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During  the  first  half  of  the  present  contract  year,  the  program 
initiated  or  continued  the  following  studies:  Speech  analysis  by  linear 

prediction,  reconstruction  of  multidimensional  signals  from  pro¬ 
jections,  applications  of  digital  frequency  warping,  and  development 
of  a  digital  speech  synthesizer.  A  review  was  also  made  oi  design 
and  synthesis  of  digital  filters  within  the  constraints  of  finite  regis¬ 
ter  length.  These  projects  are  summarized  in  the  following  pages. 
Reprints  of  available  publications  are  appended. 

The  views  and  conclusions  contained  in  this  document  are  those  of 
the  authors  and  should  not  be  interpreted  as  necessarily  representing 
the  official  policies,  either  expressed  or  implied,  of  the  Advanced  Re¬ 
search  Projects  Agency  or  the  U.  S.  Government. 
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1.  Speech  Analysis  by  Linear  Prediction 

Recently,  the  analysis  of  speech  by  means  of  a  technique  referred  to 
as  linear  prediction  has  received  considerable  attention.  This  technique 
is  directed  toward  modeling  a  sequence  as  the  output  of  an  all -pole  digital 
filter.  The  work  carried  out  under  this  research  contract  is  directed  toward 
applying  the  techniques  of  linear  prediction  to  the  extraction  of  param¬ 
eters  for  automatic  speech  recognition.  Furthermore,  we  are  at  present 
investigating  a  number  of  alternative  formulations  of  the  technique  together 
with  some  of  the  theoretical  limitations. 

A  portion  of  the  work  relating  to  the  extraction  of  parameters  fer  speech 
recognition  is  being  carried  out  on  the  fast  digital  processor  facility  at 
Lincoln  Laboratory.  The  system  implemented  is  capable  of  performing 
the  analysis  in  real  time,  and  effort  is  now  directed  toward)  the  evaluation 
of  its  performance.  Thus  far  the  results  arc  encouraging,  indicating  that 
with  the  use  of  linear  prediction  good  speech  parameter  data  can  be 
extracted.  This  aspect  of  the  work  is  reported  in  detail  by  V.  W.  Zue, 
Quarterly  Progress  Report  No.  105,  Research  Laboratory  of  Electronics, 
M.I.T.,  April  1972,  pp.  133-142. 

A  second  aspect  of  this  work  relates  to  some  possible  theoretical  short¬ 
comings  of  the  technique.  In  particular,  it  has  been  shown  that  in  certain 
situations  it  is  possible  for  the  linear  prediction  technique  to  generate 
approximations  to  the  speech  spectrum  with  large  errors.  These  results, 
and  a  comparison  of  a  number  of  formulations  of  the  linear  prediction 
technique,  have  been  summarized  by  M.  R.  Portnoff,  V.  W.  Zue  and 
A.  V.  Oppenheim  in  Quarterly  Progress  Report  No.  106,  Research  Lab¬ 
oratory  of  Electronics,  M.  I,  T, ,  July  1972,  pp.  141-150. 

Based  on  the  results  obtained  so  far,  the  linear  prediction  technique 
for  speech  analysis  appears  to  be  extremely  promising,  but  it  hns  some 
theoretical  pitfalls.  Research  under  this  contract  on  these  problems  is 
continuing. 

2.  Reconstruction  of  Multidimensional  Signals  from  Projection 

In  a  variety  of  contexts,  projections  of  multidimensional  signals  are  ^ 
available,  and  a  reconstruction  of  the  original  signal  is  desired.  The  basis 
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for  the  technique  is  that  the  Fourier  transform  of  the  projection  of  a  sig¬ 
nal  ean  be  shown  to  be  a  slice  through  the  Fourier  transform  of  the  orig¬ 
inal  signal.  The  problem  of  reconstructing  multidimensional  signals 
from  their  projections  is  encountered  naturally  in  a  variety  of  contexts 
ineluding  x-ray  photographs  and  electron  mierography.  The  purpose 
of  the  present  research  is  to  formulate  the  reconstruction  problem  entirely 
in  terms  of  discrete  signals.  We  approaehed  the  problem  from  this  point 
of  view,  and  have  obtained  some  particularly  interesting  results.  It  has 
been  shown  that  it  is  possible  under  relatively  mild  assumptions  to  com¬ 
pletely  define  a  multidimensional  signal  in  terms  of  a  single  one- 
dimensional  projection.  In  addition,  a  number  of  algorithms  have  been 
devised  for  reconstructing  multidimensional  signals  from  a  small  num¬ 
ber  of  projections.  These  results  have  potential  application  to  a  number 
of  problems.  The  possibility  is  suggested  that  bandwidth  compression  of 
multidimensional  signals  ean  be  accomplished  by  coding  in  terms  of  pro¬ 
jection.  Furthermore,  the  use  of  projections  to  describe  a  multidimen¬ 
sional  signal  appears  to  hold  some  promise  for  carrying  out  the  design 
of  multidimensional  filters.  The  theoretical  basis  for  these  algorithms 
and  some  examples  are  described  by  It.  M.  Mersereau  in  Quarterly 
Progress  Report  No.  105,  April  1972,  Research  Laboratory  of  Electronics, 
M.  I.  T. ,  pp.  169-183. 

3.  Applications  of  Digital  Frequency  Warping 

Recently,  a  technique  was  proposed  for  processing  a  signal  in  such  a 
way  as  to  implement  a  nonlinear  distortion  in  the  frequency  axis.  We 
are  investigating  application  of  this  digital  frequency  warping  to  a  num¬ 
ber  of  problems,  particularly  the  implementation  of  unequal  resolution 
speetrum  analysis.  We  have  been  investigating  the  approximation  to  con¬ 
stant  percentage  bandwidth  that  ean  be  achieved  using  this  technique  in 
conjunction  with  the  fast  Fourier  transform  algorithm.  We  have  also 
been  investigating  the  application  of  this  technique  to  Vernier  speetrum 
analysis.  If  small  errors  are  allowable,  it  is  possible  to  use  digital 
frequency  warping  for  approximately  constant  percentage  bandwidth  fre¬ 
quency  analysis,  and  it  appears  that  the  technique  will  be  applicable  to 
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Vernier  spectrum  analysis.  The  need  for  both  unequal  resolution  and 
Vernier  spectral  analysis  arises  in  a  variety  of  contexts,  including  radar 
and  sonar  processing.  We  anticipate  that  the  present  research  will  have 
application  to  those  problems.  The  details  of  the  technique  and  some  pre- 
iminary  results  are  described  by  A.  V.  Oppenheim  and  D.  H.  Johnson  in 
'Discrete  Representation  of  Signals,"  Proc.  IEEE  60,  681-691  (1972). 

4.  Development  of  a  Digital  Speech  Synthesizer 

We  are  working  on  the  design  and  fabrication  of  a  small,  fast,  inexpen - 
sive  digital  processor  to  be  used  primarily  for  speech  synthesis,  but  with 
application  to  more  general  signal -processing  tasks.  At  present,  a 
detailed  design  of  this  processor  has  been  made  and  hardware  con¬ 
struction  will  begin  shortly.  A  central  component  of  the  processor  is  a 
new  high-speed  multiplier  that  has  been  designed  with  partial  support 
from  this  contract.  This  multiplier  is  described  by  J.  Allen  and  E  R 
Jensen  in  Quarterly  Progress  Report  No.  105,  Research  Laboratory  of 
Electronics,  April  1972,  pp.  147-152. 

The  synthesizer,  when  completed,  will  be  connected  to  the  PDP-9 
computer.  It  will  operate  as  a  real-time  synthesizer  and  will  play  an 
important  role  in  a  number  of  future  research  projects  including  speech 
bandwidth  compression,  speech  analysis,  and  speech  synthesis  by  rule. 

5.  Design  and  Synthesis  of  Digital  Filters  within  the  Constraints 

of  Finite  Register  Length 

Although  in  most  cases  the  design  of  design  of  digital  filters  is 
carried  out  without  regard  to  finite  register  length,  they  must  be 
implemented  with  finite  register  length.  The  effects  of  finite  register 
length  manifest  themselves  in  a  number  of  ways,  including  parameter 
inaccuracies  and  truncation  or  rounding  after  arithmetic  operations. 

Recently,  a  review  of  these  effects  was  carried  out.  These  results 
are  discussed  in  detail  in  an  invited  paper  by  A.  Oppenheim  and  C 
Weinstein  entitled  "Effects  of  finite  register  length  in  digital  filtering 
and  the  fast  Fourier  transform"  published  in  Proc.  IEEE  60  (1972). 

During  the  coming  year,  new  research  on  these  effects  and  the  synthesis 
of  digital  filters  will  be  pursued. 
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A.  NEW  HIGH-SPEED  MULTIPLIER  DESIGN 

With  the  advent  of  MSI  and  LSI  integrated  circuit  technology,  there  is  no  doubt  that 
digital  multipliers  of  very  high  speed  can  be  achieved,  once  it  is  agreed  what  should 
be  incorporated  in  these  chips.1  In  the  meantime,  we  can  achieve  a  very  fast  design  by 
simultaneously  exploiting  the  mathematical  structure  of  binary  two's -complement  mul¬ 
tiplication  and  existing  MSI  circuits  which  can  be  adapted  in  a  natural  way  to  the  struc¬ 
ture  of  this  task.  Accordingly,  we  shall  show  that  the  expression  for  binary  multiplication 
can  be  rewritten  to  suggest  use  of  the  74181  Arithmetic  Logic  Unit  (ALU)  in  a  straight¬ 
forward  way  that  achieves  high  speed,  simple  layout,  and  very  little  logic  external  to 
the  ALU  array. 

In  order  to  display  the  desired  structure  of  multiplication,  we  shall  consider  the 
multiplication  of  two  4-bit  two's-complement  numbers.  Let  each  such  number  be  repre¬ 
sented  as 


This  work  was  supported  by  the  National  Institutes  of  Health  (Grant  5  POl 
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Z  S  Z3Z2Z1Z0  =  ~^*z3  +  zZz2  +  zlz]  +  2%. 
where  each  z{  is  either  1  or  0,  so  that  the  product  of  two  such  numbers  is 

XY  =  26x3y3  “  25(x3y2+X2y3)  +  24("X3yi+X2y2_Xly3) 

+  23(-x3VX2yl+xly2-V3) 

+  ^WVl+W  +  2l<xlWl)  +  2Vo- 

This  sum  is  commonly  arranged  in  an  array  in  which  each  column  contains  factors  of 
like  powers  of  2.  as  in  Fig.  XI-!.  The  factors  can  be  further  rearranged,  as 
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shown  in  Fig.  XI-2,  in  order  to  group  positive  and  negative  terms.  At  this  stage,  each 
row  has  a  constant  factor  for  each  term  buch  as  for  the  top  row,  and  these  can  be 
factored  out  as  bits  that  control  the  conditional  inclusion  of  a  given  row  in  the  final  sum. 
Thus,  in  Fig.  XI -3,  the  top  row  (x£  Xj  xq)  will  be  added  Into  the  sunt  Just  In  case 
yo=  !•  an^  similarly  for  the  other  rows.  Figure  XI*3  also  shows  6  conditional 
terms  to  be  summed,  but  one  of  these,  x^y^,  affects  only  the  most  significant  bit 
position.  If  this  term  is  included  in  any  other  row  (say  row  6),  the  only  change 


(«a  ♦  »)  ♦  C)  •  (D  ♦  t)) 


Fig.  XI -4.  Illustrating  the  parallel  nature  of  the  multiplication  task. 
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in  the  final  result  might  be  a  earry  out  of  the  most  significant  bit  V'hen  sign  exten¬ 
sion  is  not  desired,  it  is  thus  possible  to  incorporate  the  x^y^  term  in  row  6,  and 
we  are  left  with  just  5  rows  to  sum.  It  is  this  representation  of  the  product  as  a 
sum  of  conditional  terms  that  ean  be  exploited  by  the  ALU  design. 

Having  rewritten  the  produet  as  a  sum  of  terms,  eaeh  conditioned  on  a  control 
bit,  the  next  step  is  to  minimize  the  sum -and-carry  delays  by  exploiting  the  inher¬ 
ent  parallelism  of  the  array.  Referring  to  the  five  rows  as  A  through  E,  Fig.  XI-4 
shows  how  a  tree  structure  permits  simultaneous  sums  to  be  computed,  rather  than 
performing  eaeh  indicated  sum  in  serial,  left-to-right  order.  In  Fig.  XI-4,  eaeh 
box  denotes  an  ALU  adder,  and  we  assume  that  eaeh  add  is  completed  in  A  seconds. 
During  the  first  A  seconds  two  adds  are  completed,  followed  by  one  add  in  eaeh  of 
the  two  succeeding  intervals.  There  are  two  advantages  in  this  scheme.  First, 
the  total  delay  would  be  4A  seconds  if  the  adds  were  done  serially,  but  when  the 
parallelism  is  utilized,  only  3A  seconds  of  delay  result.  More  generally,  for  larger 
sized  numbers  N  bits  long,  a  similar  binary  tree  would  lead  to  (log2  N)A  seconds 
delay,  whereas  a  serial  procedure  would  require  (N-l)A  seconds  delay.  For  N  = 

16,  the  saving  is  A(15-4)  =  11A,  a  very  substantial  figure.  When  N  is  not  a  power 
of  2,  some  branches  of  the  full  binary  tree  are  pruned,  but  the  saving  in  time 
because  of  parallelism  is  still  obtained.  The  second  advantage  is  that  the  binary 
tree  arrangement  ean  be  implemented  in  a  straightforward  and  natural  way  by  using 
the  74181  ALU,  24 -pin  MSI  package. 

The  74181  ALU,  shown  in  Pig.  XI-5,  operatf  s  on  two  4-bit  inputs  in  a  manner 
prescribed  by  the  four  control  bits,  Sq  through  Sy  and  the  Mode  Control  bit  M,  to 
produce  a  single  4 -bit  output.  As  shown  in  Fig.  XI-6,  all  of  the  needed  control 
functions  can  be  realized  by  appropriate  use  of  S  through  S,,  M,  and  C  ,  the  last 

O  i  O 

being  the  input  earry  to  th  least  significant  bit.  Note  that  only  the  double  condi¬ 
tional  sum,  (A  if  z)  +  (13  if  y ) ,  requires  extra  circuitry  to  translate  the  condition 
bits  (z  and  y)  into  ALU  controls,  but  that  this  eireuitry  is  very  simple,  containing 
only  an  XOR  gate  and  an  inverter. 

Figure  Xl-7  shows  the  complete  design  for  a  4  X  4  multiplier,  in  which  the  con¬ 
trol  circuitry  is  shown  in  detail.  Depending  on  the  size  of  the  multiplier  desired, 
extra  time  savings  may  be  realized  by  appropriate  partitioning  of  the  array  and 
insertion  of  carries,  but  the  basic  details  remain  the  same.  The  authors  have 
designed  16  X  16  and  16  X  24  arrays,  which  illustrate  further  refinements.  These 
designs  are  available  to  the  interested  reader. 

A  further  advantage  of  the  ALU  is  its  wide  availability.  Originally,  it  was 
introduced  in  TTL,  but  Schottky  TTI,  and  MECL  10,000  versions  are  now  avail¬ 
able.  Worst -ease  multiplication  times  will  depend  on  which  one  of  these  pack¬ 
ages  is  used,  but  a  16  X  16  design  should  yield  a  completion  time  of  95-100  ns 
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Fig.  XI-5.  Logic  diagram  for  the  74181  Arithmetic  Logic  Unit. 


Fig.  XI-6.  Control  functions  for  the  74181  Arithmetic  Logic  Unit. 
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*1 


y2 


X 

y 


3 

3 


(a) 


CONTROLS  0,2  CONTROL  1 

(b) 

Fig.  XI-7.  Four  by  four  multiplier  block  diagram,  with  external 
control  circuitry. 


in  Schottky  TTL.  This  is  considerably  faster  than  the  performance  obtainable  from 
specialized  multiplier  packages,  such  as  the  Fairchild  9344  or  the  Advanced  Micro 
Devices  AM  2505.  Since  the  ALU  package  has  many  uses,  it  is  relatively  inexpensive, 
particularly  considering  the  resulting  multiplier  speed.  The  package  count,  and  hence 
power,  is  high  (approximately  N(N+l)/4  for  an  NXN  multiply;  for  N  =  16,69  ALU's  were 
required)  but  layout  is  simple,  and  no  other  design  incorporating  standard  commercial 
MSI  packages  has  been  able  to  yield  the  speed  of  this  ALU  array. 

Certainly  faster  or  cheaper  multipliers  have  been  built.  The  ALU  in  a  binary  tree, 
however,  appears  to  be  an  optimal  choice  when  very  high  speed  is  desired  from  standard 
commercial  packages. 

J.  Allen,  E.  R.  Jensen 
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A.  RECENT  ADVANCES  IN  THE  THEORY  OF  RECONSTRUCTING 

MULTIDIMENSIONAL  SIGNALS  FROM  PROJECTIONS 

1 .  Introduction 

The  problem  of  reconstructing  multidimensional  signals  from  their  projections  is 
of  interest  because  x-ray  photographs  and  electron  micrographs  can  be  considered  to 
be  projections  of  three-dimensional  objects.  Thus  mathematical  techniques  for  per¬ 
forming  such  reconstructions  will  permit  us  to  reconstruct  visually  opaque  objects  from 
their  x-rays  at  different  orientations  and  to  determine  the  structure  of  macromolecules 
from  electron  micrographs.  In  a  previous  report1  some  techniques  were  discussed 
whereby  we  could  perform  such  a  reconstruction;  in  the  present  report,  some  other 
more  powerful  algorithms  will  be  developed.  One  of  these  algorithms,  in  fact,  permits 
the  reconstruction  of  a  broad  class  of  multidimensional  signals  of  any  dimensionality 
from  a  single  one -dimensional  projection. 

The  idea  of  reconstructing  functions  from  their  projections  can  be  applied  to  func¬ 
tions  of  any  dimensionality;  however,  the  most  interesting  problems,  since  they  have 
useful  applications,  are  the  two-dimensional  and  the  three-dimensional  problems.  By 
extension,  them  is  a  one -dimensional  problem,  but  it  is  a  trivial  case  because  the  pro¬ 
jection  of  a  one-dimensional  function  is  the  function  itself.  Most  of  the  derivations  in 
this  report  will  be  given  in  terms  of  the  two-dimensional  problem  because  it  is  n  j  ation- 
ally  and  conceptually  simpler  than  the  three-dimensional  problem,  but  we  shall  also 
explore  some  of  the  issues  that  are  unique  to  the  three-dimensional  case. 

In  both  of  the  algorithms  that  are  developed  here  it  is  assumed  that  the  function 
which  is  being  reconstructed  is  baudlimited,  and  if  a  further  assumption  is  made  they 
will  yield  exact  reconstructions.  If  these  assumptions  are  not  appropriate  for  the  prob¬ 
lem  at  hand,  there  are  other  techniques  that  will  yield  approximate  reconstructions.  1_^ 


"This  work  was  supported  by  the  Joint  Services  Electronics  Programs (U.  S.  Army, 
U.S.  Navy,  and  U.S.  Air  Force)  under  Contract  DAAB07-7 1 -C-0300,  by  theU.S. 
Coast  Guard  (Contract  DOT-CG-1344t>-A),  and  by  M.I.T.  Lincoln  Laboratory  Purchase 
Order  CC-570. 
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Inasmuch  as  we  shall  deal  with  bandliniited  functions  exclusively,  it  is  appropriate  to 
begin  with  a  discussion  of  the  properties  of  the  projections  of  multidimensional  band- 
limited  functions. 

-.  Projections  of  Bandliniited  Functions 

The  assumption  of  bandlinntedness  is  not  especially  harsh,  for  although  most  func¬ 
tions  that  we  shall  reconstruct  are  spacelimited  and  hence  strictly  speaking  not  band- 
limited.  they  are  nearly  so.  Furthermore,  if  any  algorithm  is  to  be  implemented  on  a 
computer,  it  is  necessary  to  reconstruct  a  sampled  multidimensional  function  from 
sampled  projections.  Thus  bandlimitedness  is  implicitly  assumed  to  a  greater  or 
lesser'  degree  by  all  digital  reconstruction  algorithms.  In  these  algorithms  we  shall 
explicitly  assume  bandlimitedness  and  then  utilize  this  assumption  in  the  design  of  our 
algorithms,  with  the  hope  that  they  will  yield  high-quality  reconstructions  for  nearly 
bandliniited  functions.  The  last  premise  must  be  verified  experimentally. 


Fig.  XIII- I .  Relationship  between  a  projection  and  a  slice. 

The  projections  of  a  two-dimensional  function  (picture)  can  be  considered  as  a  col¬ 
lection  of  line  integrals  taken  perpendicular  to  an  axis,  which  we  call  the  projection 
axis.  Thus  the  projection  perpendicular  to  the  x  axis.  p()(x),  can  be  defined  as 

p()(x)  =  f(x.  y)  d\ . 

At  a  general  angle  0,  a  projection  can  be  similarly  defined  by 

pfj(u)  -  I  _'y  f(u  •  cos  0  +  v  •  sinO,  -u  •  sin  0  +  v  •  cos  0)  dv,  ( | ) 

and  it  satisfies  the  Fourier  transform  relation 
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|>q(u)  ►  F(<*>eos  0,  u>sin  0),  (2) 

where  F(wx,  .»)  )  represents  the  two-dimensional  Fourier  V  ansform  of  f(x,y).  The 
right-hand  side  of  Eq.  2  will  be  referred  to  as  the  slice  of  the  tw’o-dimensional  Fourier 
transform  at  an  angle  0.  Thus  the  one-dimensional  Fourier  transform  of  the  projection 
of  a  picture  at  an  angle  0  to  the  x  axis  is  a  slice  of  the  two-dimensional  Fourier  trans¬ 
form  of  that  picture  at  an  angle  0  with  the  u>x  axis.  This  relationship  is  illustrated  in 
Fig.  Xlll-1. 


Fig.  XIII-2.  Region  of  Fourier  plane  over 
which  a  bandlimited  picture 
is  nonzero. 


If  we  now  assume  that  the  picture  is  bandlimited,  that  is,  that  its  frequency  response 
is  nonzero  only  in  that  region  of  the  Fourier  plane  illustrated  in  Fig.  XIII -2,  then  we 
can  use  the  sampling  theorem  to  express  the  picture  in  terms  of  its  samples  on  a  regular 
Cartesian  raster  as  in 


f(x,y) 


(3) 


Since  all  of  the  projections  transform  to  slices,  they  to*'  must  be  bandlimited  (in  one 
dimension)  and  each  projection  can  thus  h<-  expanded  in  terms  of  its  samples  as  in 


The  bandwidth  of  each  projection  \VQ  can  be  expressed  as 
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max  {  |  .os  0  | ,  j  sin  6  |  } 

From  Eqs.  4  and  5  we  can  ascertain  the  Nyquist  sampling  rate  for  each  projection, 
which  is  observed  to  be  a  function  of  0,  the  projection  angle.  Since  we  must  work  with 
sampled  projections,  this  will  prove  to  be  an  important  quantity. 

We  can  get  an  alternative  expression  for  pQ(u),  not  in  terms  of  the  samples  of  the 
projections,  but  in  'erms  of  the  samples  of  the  picture  itself.  If  we  take  the  Fourier 
transform  of  Eq.  3,  we  get 


F(w 


U)  ) 

y 


2  v 
=  \ 

,2  L 

m=-« 


W 


on 

\  exP  {”j  W  <m«*>x+nwy  >} 


bWW(wx* 


U)  )  , 

y 


(6) 


where 


bWW(<V 


if  1  w  ■  $  W  and  |  u>  I  s  W 
x  y 

utlH'I’lrjHI 


From  Eq.  6  we  can  evaluate 
Eq.  2). 


F(ujcos0,  wsinG)  which  is  the  expression  for  a  slice  (from 


F(wcos  0,  wsinO) 


^  UD  00 

£  1  1 


in=-oo  n= 


(mcos0  +  nsin0) 


} 


X  bww(wcos  0,  usin  0). 


(7) 


Performing  an  inverse  Fourier  transform  on  Eq. 
tion  at  angle  0. 


7  gives  an  expression  for  the 


projec 


m=-<*>  n=-oo 


sin  we(u"  Wco"  0  -*Wsin  °) 

u  -  ^  cos  0  -  2Z  Sin  o 


(8) 


In  the  two  reconstruction  techniques  that  follow,  we  must  impose  one  further 
restriction  on  the  picture  in  addition  to  bandlimitedness.  We  must  assume  that  the 
digitized  picture  f(^,  be  nonzero  for  integral  values  of  m  and  n  only  when  m 

and  n  are  in  the  range  0  5  m,  n  «  N-l,  for  some  finite  integer  N.  We  call  this 
assumption  quasi-spacclimitedness,  although  note  that  we  do  not  assume  that  f(x,y)  is 
spacelimited  (which  would  contradict  the  assumption  that  it  is  bandlimited),  but  only 
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that  its  samples  are  spacclimited.  This  assumption  has  the  effect  of  making  the  double 
summations  of  Eqs.  3,  6,  7,  and  8  finite.  This,  like  the  assumption  of  bandlimitedness, 
is  implicit  in  most  reconstruction  techniques,  since  only  a  finite  number  of  samples 
of  the  Fourier  transform  of  the  picture  are  generally  computed,  and  only  a  finite 
number  of  picture*  samples  are  reconstructed. 

3.  An  Algorithm  for  Reconstructing  a  Function  from  N+l  Projections 

Equation  4  gives  us  the  smallest  sampling  rate  that  can  be  employed  for  sampling 
a  projection  in  order  that  information  not  be  lost  by  sampling.  Each  projection,  of 
course,  can  be  sampled  with  a  higher  rate.  The  traditional  approach  for  getting 
samples  of  the  sLces  of  a  picture  is  to  find  a  sampling  rate  that  is  large  enough  so  that 
all  of  the  projections  can  be  sampled  at  the  same  rate.  The  resulting  sequences  can 
then  be  aliased  to  give  M  point  sequences,  and  these  M  point  sequences  can  then  be 
Fourier -transformed  by  using  a  discrete  Fourier  transform  (EFT)  algorithm  to  yield 
M  sample  values  along  each  slice.  The  M-point  aliased  sequence  x(n)  corresponding 
to  the  infinitely  long  sequence  x(n)  is  defined  by 

<z 

x(n)  =  N  x(Mra+n), 

L~d 

m=-<« 

If  this  procedure  is  followed,  the  Fourier  transform  of  the  picture  will  be  known 
at  points  lying  on  a  polar  lattice.  The  points  of  such  a  lattice  can  be  thought  of  as  the 
intersections  of  the  set  of  slices  with  a  family  of  evenly  spaced  concentric  circles, 
including  one  of  zero  radius  at  the  origin.  Once  the  transform  of  the  picture  is  known 
at  these  points,  the  next  step  is  to  approximate  the  transform  of  the  picture  over  the 
whole  plane  and  then  perform  an  inverse  Fourier  transform.  There  are  no  nice 
polar  "sampling  theorems"  that  will  allow  us  to  obtain  directly  the  set  f  yf)* 

As  a  different  approach,  let  us  therefore  sample  each  projection  at  its  own 
Nyquist  rate,  or  at  a  rate  proportional  to  its  Nyquist  rate,  then  alias  the  resulting 
sequences  to  N  points  (N  is  the  width  of  the  digitized  picture)  and  use  a  EFT  algo¬ 
rithm  to  get  samples  of  the  Fourier  transform  of  the  picture.  If  this  procedure  is 
followed,  the  Fourier  samples  which  result  lie  at  the  intersection  of  the  slices  with  a 
family  of  concentric  squares,  as  illustrated  in  Fig.  XIII -3. 

In  the  special  case  of  a  bandlimited  quasi -spacclimited  (BLQSL)  function,  a  concen¬ 
tric  squares  lattice  has  definite  advantages  over  a  polar  one.  Along  any  horizontal 
or  vertical  lines  in  the  Fourier  plane,  the  Fourier  transform  of  a  BLQSL  function  is  a 
one-dimensional  complex  polynomial  of  degree  N-l,  and  as  a  result  any  line  in  the 
Fourier  plane  is  completely  specified  by  N-l  samples  that  lie  along  that  lino. 
Furthermore,  a  BLQSL  function  is  completely  specified  by  its  EFT,  that  is,  by  the 
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1  iji.  XI1I-3,  A  si  t  of  samples  of  the  Kourier  transform  of  a  hand  limited 
function  obtained  by  sampling  eacli  projection  at  a  rate  pro¬ 
portional  to  its  own  Nyquist  rate. 


a  * 

y 

is B* 


X- POINTS  AVAILABLt  FROM 
PROJECTIONS 

•  -  OFT  POINTS 


Si  i a 

HTuiKTiuii  i 


I  iq.  Xlli>4.  Set  of  Fourier  plane  samples  by  which  an  8  X  H  picture  can 
lie  reconstructed  exactly,  under  the  assumption  that  the  pic¬ 
ture  is  handlimited  and  <|uasi-spaee| iniited. 
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samples  of  its  Fourier  transform  at  F^~  i,  -^j-j),  --^+  *  ^  ^esc  P°in*s 

on  the  sides  of  the  concentric  squares  or  their  extensions.  These  properties  enable 
us  to  reconstruct  a  BLQSL  function  exactly  from  a  set  of  N  concentric -squares  pro¬ 


jections. 

Suppose,  for  example,  we  have  a  two-dimensional  BLQSL  function  of  dimension  N. 

Let  us  assume  also  that  we  have  the 
capability  of  obtaining  the  projections 
of  the  picture  at  any  angle  we  desire. 

We  can  thus  take  N  projections  at  N  dis¬ 
tinct  angles  in  the  range  -45°  e  0  «  45° , 
and  we  can  also  take  one  projection  at 
an  angle  outside  this  range.  The  known 
points  in  Fourier  space  will  then  corre¬ 
spond  to  those  illustrated  in  Fig.  XIII-4 
for  the  special  case  N  =  8.  Along  each  ver¬ 


tical  square  side  we  thus  have  8  samples 
and  along  these  sides  the  Fourier  trans- 

iL 

form  is  a  7l  -order  polynomial  in  the 

-jljJ 

variable  e  y.  Then,  using  Lagrange 
polynomials  (or  some  other  technique), 
we  can  evaluate  the  Fourier  transform 
at  all  of  the  I)FT  points  on  each  of  the 


vertical  lines,  except  for  the  one  at  = 
0.  Now  consider  the  horizontal  sides. 
Along  each  of  these  lines  we  also  have 
a  polynomial  of  degree  7  and  we  also 
have  8  samples,  seven  computed  from 
the  column  calculations,  and  the  eighth 
provided  by  the  remaining  projection. 
Since  this  projection  was  taken  outside 
the  range  -45°  0  $  45° ,  it  must  inter¬ 

sect  all  of  the  horizontal  square  sides 


(and  must  also  not  pass  through  any  of 
the  I)FT  points  whose  value  is  already 


I  ip.  Mtt-  V 


known).  Thus  we  can  apply  Lagrange 


Comparison  of  reconstructions  from  a  con¬ 
centric  squares  grid  and  from  a  concentric 
circles  grid,  (a)  Original  picture,  (b)  Con¬ 
centric  squares  reconstruction,  (c)  Polar 
(concentric  circles)  reconstruction. 


polynomials  to  the  horizontal  lines  to 
fill  in  the  remaining  DFT  values.  Con¬ 
sequently,  we  know  all  of  the  DFT  values 
exactly,  and  a  BLQSL  picture  can  be 
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,  ,  .t„  nFT  so  that  we  know  the  set  of  picture  samples  exactly.  If  the 

reconstructed  from  its  DF  T  so  that  we  ^  second  round  Df 

last  projection  had  been  taken  perpendicuiar  to ^t y  .  would  be 

interpolation  would  not  have  been  necessary,  for  the  remaining 

,.  t  rv,  tVua  nFT  of  the  last  sampled  projection. 

aVai,‘ab,.ie  ‘xHl-5  we  show  a  concentric-squares  reconstruction  and  compare  it  with 
n  g"  •  rirclcs  (polar)  reconstruction.  Instead  of  using  Lagrange 

::: 

=r^r 

— •  -  -  - 

concentric -squares  reconstruction  is  truer  to  the  original. 

4.  Reconstructing  a  BLQSL  Picture  from  a  Single  Projection  ^ 

Let  us  now  restrict  ourselves  to  the  slice  at  an  angle  6  .  tan'  l/N. 
this  slice  can  be  written  as 


From  Eq.  7 


Nw 


w 


\a/  n2  +  1  ^ 


+  1 


N-l  N-l 

=  £  liWW)*** 

m=0  n=0 


-j 


TTU) 


J? 


if 


W(Nm+n) 


+  1 


N 


=  0 


otherwise 


(9) 


It  we  define  g(Nm+n)  =  f (^.  w)'  Eq'  9  b° 


becomes 


n2-i 

-?==)•  I 

W N“  +  1  V  N  +  1  / 


TTUll 


g(l)  exp  -i 


-Y 


=  o, 


if  M 


otherwise 


(10) 


,,  „*  n  -  tan-1  l/N  is  a  one-dimensional  poly- 
Thus,  over  the  region  of  interest,  the  she  ^  „„  \  a„d  thc  coefficients  of  that 


lomial  of  degree  X2-l  in  the  variable  cap  M 


r  i  ^  j 

polynomial  are  simply  the  picture  samples  arranged  as^  they  would^b  ^ 

were  scanned  column  by  column.  Since  1'  I  -  ’  -  ’ 


Vn2^ 
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2 

N  samples  taken  along  the  slice,  and  knowledge  of  a  polynomial  implied  knowledge 
of  its  coefficients,  specification  of  N2  samples  along  the  slice  at  6  =  tan-1  l/N  implies 
knowledge  of  the  whole  set  of  picture  samples.  (Similar  statements  can  be  made  about 
other  slices  in  the  Fourier  plane.) 

Let  us  now,  for  convenience,  define  G(w)  =  FI 


w 


A 


N2  +  1 


set  Aw  = 


/  2 

2WVN  +  1 
N3 


Then  if  we  compute  G(kAw)  for  k  =  -  +  1, 


and  let  us 


0  I  N- 

U'  1 2  ' 


2 

we  shall  have  N  equally  spaced  samples  of  G(w)  which  extend  over  the  entire  band. 

There  is  a  strong  reason  for  choosing  this  particular  set  of  frequency  samples  on  this 

slice.  If  the  projection  p  _j  j  (u)  is  sampled  at  its  Nyquist  rate,  if  the  infinite 

tan  jj 

sequence  that  results  is  then  aliased  to  give  a  sequence  of  length  N2,  and  if  this 
sequence  is  then  Fourier-transformed  by  means  of  the  DFT,  the  resulting  N2  point 
sequence  is  G(kAw).  Substituting  in  liq.  10,  we  have 


G(kAw) 


N2-l 

1=0 


g(l)  exp  [ -j 

V  N 


k  =  - 


N‘ 


+  i  £L 
T  *  •  ■  '  2  ' 


(11) 


Examining  Eq.  11,  we  see  that  G(kAw)  corresponds  to  the  first  N2  points  of  the  N3  point 
DT  1  of  the  sequence  formed  by  taking  the  N  picture  samples  column  by  column  and 
appending  N  -N2  zeros. 


The  sequence  G(kAw)  could  be  obtained  from  the  sequence  g(l)  by  means  of  a  chirp 
z-transform  algorithm  CZT.  To  obtain  g(l)  (the  picture  samples)  from  G(kAw),  we  thus 
need  an  inverse  CZT,  which  will  be  developed. 

These  results  have  an  interesting  interpretation  in  terms  of  another  problem.  The 


impulse  response  of  a  two-dimensional  nonrecursive  digital  filter  behaves  exactly  like 
the  set  of  rectangular  samples  of  a  BLQSL  picture  and  thus  the  impulse  response,  or 
the  two-dimensional  frequency  response,  of  such  a  filter  is  completely  specified  by  its 
frequency  response  along  the  line  0  =  tan  l/N.  As  well  as  providing  an  interesting 
property  for  such  filters,  this  result  suggests  a  mapping  between  one -dimensional  non¬ 
recursive  and  two-dimensional  nonrecursive  filter  designs  that  may  be  useful  in  filter 
design.  These  implications  are  worthy  of  further  study. 


5.  Reconstructing  a  Three-Dimensional  BLQSL  Function  from 
a  Single  Projection 


Probably  the  simplest  way  to  reconstruct  a  three-dimensional  sequence  from  its 
projections  is  to  consider  that  three-dimensional  sequence  as  a  stack  of  two-dimensional 
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•sequences.  If  we  think  of  these  two-tllim nslonal  sequences  as  lying  paralh I  to  the 
x-y  plane,  and  then  take  a  projection  parallel  to  the  x*y  plant  nt  an  angle  0  • 
tan-*  l/N  with  the  x  axis,  then  the  resulting  two-dimensional  projection  of  the  three- 
dimensional  object  will  be  a  slack  of  one -dimensional  projection  functions, each  of  which 
is  .he  projection  of  one  member  of  the  original  stack  of  tw  -dimensional  functions 
and  each  of  which  is  taken  at  Its  critical  ungle.  This  Is  a  straightforward  extension 
of  the  two-dimensional  problem  and  h.crdly  requires  elaboration.  It  would  Is  a  compu¬ 
tationally  efficient  scheme,  however.  If  a  complete  reconstruction  were  not  desired,  but 
only  a  limited  number  of  cross  sections. 

I’T’om  a  theoretical  |>oint  of  view,  u  mure  interesting  approach  to  the  three- 

dimensional  problem  Is  to  parallel  the  reasoning  of  the  two-dimensional  analysis.  In 

that  case  we  found  a  line  in  the  I'ourier  plane;  If  we  knew  the  I  ourler  transform  uf  the 

picture  along  this  line,  then  we  knew  the  whole  set  of  picture  samples.  Such  a  llm  also 

exists  In  the  three-dimensional  case,  'lids  is  that  line  which  Is  traced  out  by  the  vector 

w  ,  where 
c 

(  */.N4  t  N2  ♦  I  \/ N4  ♦  N2  ♦  I  \/n4  4  N2  «  1  ) 

Along  this  line  the  frequency  n  s|H»nse  is  a  polynomial  of  degree  N-l,  and  the  roelfl 
cients  of  this  |»dynoininl  are  the  function  samples  ®  m,  n*  I1  E  N-l, 

where  W  is  tin:  hand  width,  defined  as  in  the  two-dimensional  **ase.  If  wv  sample  this 
line  at  N^  evenly  spaced  points  over  tin  band,  then 


(i(kA^) 


N-l 

.  \ 

‘  L 

m  =  0 


N-l  N-l 
\  \ 
L  L 

n=0  p=0 


k  « 


.  0.  I, 


fl2» 


Thus  we  have  the  first  N*  |siints  of  an  N ^  |*oint  sequence.  liquation  12  can  In'  solved  l»y 
using  the  inverse  chirp  z- transform. 

The  projection  of  a  three-dimensional  function  Is  two-dimensional,  whereas  the 
critical  line  along  which  we  desire  the  frequency  response  Is  one-dlmenslonal.  This 
frequency  res|xjnsr  can  lie  evaluated  directly  from  tin  two-dimensional  projection 
samples  (the  projection  Is  a  liandlimlted  functhnil  or  equivalently  a  «oie- dimensional  pro¬ 
jection  of  the  two-dimensional  projection  can  lie  com|MJted  digitally  and  then  transformed 
If  the  angle  of  this  projection  is  chosen  properly,  this  slice  of  a  slice  will  corn' * 
spend  to  the  desired  line.  It  must  lie  remembered  however,  when  working  with  the 


Ol’K  No.  105 


17« 


(XIII.  SIGNAL  PROCESSING) 


‘  iV”  Xlll-6.  1  wo -dimensional  slice  of  the  Fourier  transform  of  a 
three-dimensional  function  taken  perpendicular  to  the 
plane  ^  =  Nuy  The  bandwidth*  of  the  slice  are  shown, 

as  well  as  the  location  of  the  critical  line,  whose  crit¬ 
ical  frequency  response  determines  the  whole  three- 
dimensional  frequency  response. 


two-dimensional  projection  that  although  this  is  a  bandlimited  function,  the  bandwidth 
in  the  two  orthogonal  frequency  variables  is  a  function  of  the  direction  of  that  projec¬ 
tion.  In  Fig.  Xlll-6  we  show  the  relevant  parameters  for  computing  the  frequency 
response  along  the  critical  line  when  the  original  projection  was  projected  onto  the  plane 


w  —  Noj 

x  y 


(>.  Inverse  Chirp  z- Transform 

I  he  chirp  /.-transform  (CZT)  algorithm7  is  an  efficient  algorithm  for  evaluating 
the  sum 


1-1 

X.  =  \  x(n)(A\Vk)n 

K  L-, 
n=  0 


where 


k  =  0,  1,  ....  K-l, 


(13) 


A  =  A^  exp(j2ir0  ) 

W  =  VV  cxp(j2wd>  ), 
o  •*  O' 

1  he  CZ  J  calculates  the  /.-transform  of  the  finite  duration 


sequence  x(n) 


at  a  set  of 


points  that  are  regularly  spa.-ed  on  a  spiral  in  the  z  plane  as  illustrated  in  Fig.  XIII-7.7 
liquation  11  can  be  seen  to  be  of  the  same  form  as  Kq.  13  if  in  place  of  the  sequence  x(n) 
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we  substitute  the  sequence  g(l)  =  g(Nm+n)  =  f(^L,  and  if  we  set  A  =  expf -j  +  l 
and  if  W  =  exp(j2ir/N  ).  The  sequence  Xk  and  A  and  W  arc  known  in  this  particular 
case,  and  we  desire  a  means  of  calculating  g(l).  What  we  need,  therefore,  is  a  means 
of  inverting  Eq.  13  -  an  inverse  CZT. 


1' ig.  XIII-7.  Illustration  of  the  independent  parameters  of  the 
CZT  algorithm  and  the  inverse  CZT  algorithm. 

(Modified  from  Rabiner  et  al.8) 

Since  the  sequence  corresponds  to  samples  of  a  polynomial  of  degree  L-l,  we 
know  that  Eq.  13  can  be  inverted  if  there  are  more  than  L  independent  values  of  X 
or  if  K  3  L.  This  follows  from  the  fact  that  the  matrix  of  coefficients  [(AWk)n]  is  a 
Vandermonde  matrix.  One  possible  technique  to  use  is  to  invert  (13)  directly.  For 

values  of  K  of  the  order  of  several  thousand,  however,  this  is  computationally  not 
feasible. 

Another  approach  which  proves  to  be  far  more  attractive  computationally,  although 
at  first  appearance  it  would  not  be  so,  is  to  use  the  Lagrange  polynomial  interpolation 
formula  to  reconstruct  the  complete  polynomial  over  the  whole  z  plane  from  the 
si't  of  K  samples  and  then  perform  an  inverse  z-transform  integral  of  this  poly¬ 
nomial  to  get  the  sequence  x(n). 

If  X ( z )  is  a  polynomial  of  degree  L-l  in  z'1  and  if  X(z)  is  specified  at  the  points 

>*  1  r.  4  J.1  . 
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L-l 

XM=  £  X(zm]  (14) 

m=0 

where 

z^1  =  [AWm  ]_1  m  =  0,  1 .  L-l 

and  fm(z-1)  is  a  Lagrange  interpolating  polynomial 


f  (z  )  = 
m 


(z“1-zo’1)(Z‘1"ZI1)  (z"1-zm-l)(z"1-zm+l)  •••  (Z_1-ZL-1 ) 

A  v 


(zm -zo^ )  (zm  _Z1 1 )  '  ’  '  (zm  ~zm- 1 )  (zm ”zm+ 1 )  ’  '  '  K’il) 
Since  the  denominator  of  f^(z  S  is  a  constant,  let  us  write  it  as  l/Cm>  Thus 


m 


X(z) 


X(z  )  C 
m  m 


L-l 

y 

^  .  /  -i  -i 
m=0  ( z  -z 

\  m 


) 


f=o  '  4  ' 


(15) 


Equation  15  represents  the  z-transform  of  the  sequence  x(n)  which  we  desire.  Thus 
we  see  that  the  sequence  x(n)  can  be  regarded  as  the  impulse  response  of  a  bank  of 
resonators  and  a  comb  filter  in  cascade,  as  in  Fig.  XIII-8. 


Fig.  XIII-8.  Digital  network  implementation  of  the  inverse  CZT. 
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Let  us  define  h(n)  to  be  the  impulse  response  of  the  bank  of  resonators.  From  Eq.  15 
we  can  write 

L-l 

h(n)  =  ^  -  C.X(z.)  z"+1  n  =  0,  1 . L-l. 

i=0 

If  we  recognize  that  z.  =  AW-1,  then 
L-l 

-  CjXtz.)  An+1W~i(n+1) 

n  =  0,  l,  .  .  .  ,  L-l.  (16) 

If  we  write 


him  =  V 
i=0 


L-l 

=  An+1  ^  -  C.X(z.)  W-i  W-in 
i=0 


N-l 

CZT(x(n),  A,  W,  N)  =  ^  x(n)  A-nWnk,  (17) 

n=0 

then  we  ean  write  (16)  in  the  form 

h(n)  =  -An+ICZT(CnX(zn),  W,  W_I,  L)  (I8) 

and  thus  h(n)  can  be  evaluated  efficiently  by  using  the  CZT  algorithm  itself. 

The  output  sequence  x(n)  is  then 

x(n)  =  h(n)  @  m(n), 

where  (*)  denotes  convolution.  Inasmuch  as  we  only  care  about  the  first  L  values  of  the 
sequence  x(n)and  m(n)is  a  causal  sequence  of  length  L+  i ,  only  the  first  L  values  of  the 
sequence  h(n)are  necessary.  This  fact  allows  us  to  evaluate  h(n)using  a  CZT,  and  will 
further  allow  us  to  perform  the  convolution  of(18)using  high-speed  convolution  techniques. 

Except  for  calculating  the  arrays  C’k  and  m(n),  the  computation  of  the  inverse  CZT 
can  all  be  done  efficiently.  In  fact,  the  time  required  to  calculate  an  inverse  CZT 
is  approximately  twice  that  required  to  calculate  a  CZT,  and  thus  is  roughly  pro¬ 
portional  to  2L  log2  2 L  if  L  is  a  power  of  two.  To  the  best  of  my  knowledge,  there 
are  no  particularly  convenient  methods  for  calculating  Cfc  and  m(n).  These  quantities 
do  not  depend  upon  the  sequence  Xk>  bu  only  on  the  location  of  the  samples  of  the 
z -transform  in  the  z  plane,  and  therefore  trey  will  be  the  same  for  all  reconstructions 
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of  a  given  size,  which  will  allow  these  arrays  to  be  precomputed  and  stored.  In  this 
sense,  the  calculation  of  these  quantities  can  be  overlooked  when  talking  about  compu¬ 
tation  times.  To  reconstruct  a  32  X  32  array  from  a  single  projection  requires  approx¬ 
imately  10^  operations  (complex  multiplies  and  adds)  if  the  calculation  of  these  initial 
arrays  is  overlooked,  and  it  requires  approximately  5000  complex  storage  locations. 

To  solve  Eq.  13  by  direct  inversion  would  require  approximately  10^  operations  and 
roughly  10^  complex  storage  locations. 

A  one -projection  reconstruction  algorithm  is  now  being  implemented. 

R.  M.  Mersereau 
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B.  TRANSIENT  RESPONSE  OF  A  VARACTOR-CONTROLLED 
OSCILLATOR 

A  study  has  been  made  of  the  Q-related  effects  on  the  transient  response  of  a 
voltage -controlled  negative-resistance  oscillator.1  The  equation  governing  the  nonlinear 
oscillations  of  a  second-order  time-invariant  circuit  has  the  form 


2 


GO  X 


o 


in  which  f^x,  is  a  general  nonlinear  function  of  the  variable  x  and  its  derivative, 
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and  6  is  proportional  lo  the  reciprocal  of  the  effective  Q  of  the  circuit.  A  perturbational 
analysis  of  this  equation  yields  solution  in  which  the  instantaneous  frequency  of  oscil¬ 
lation  u!|  is  given  by  an  expression  of  the  form  u>j  =  u>o  +  !■  (n).  where  a  corresponds 
to  the  magnitude  of  tin?  amplitude  envelope  of  the  oscillations.  This  expression  indicates 
a  possible  variation  in  frequency  because  of  a  variation  in  the  amplitude  envelope  during 
a  transient  period  of  tin*  oscillations.  It  can  be  argued  that  the  frequency  does  not  reach 
a  steady-state  value  until  the  amplitude  reaches  a  steady-state  value. 

1.  Cause  and  Modi?  of  Transient  Operation 

An  analysis  of  an  idealized  step-change  in  one  of  the  frequency-determining  elements 
indicates  that  such  a  change  could  cause  a  disturbance  from  the  equilibrium  steady-state 
oscillation.  Such  a  disturbance  implies  that  the  state  of  the  oscillation,  specified 
in  the  phase  plane  by  x  and  dx/dt  immediately  after  the  change  occurs,  does  not 
correspond  in  general  to  a  state  that  is  located  on  the  steady-state  limit  cycle.  Standard 
phase-plane  analysis  shows  that  if  the  state  of  the  oscillator  is  described  by  a  set  of 
coordinates  (the  operating  point)  not  located  on  the  limit  cycle,  the  oscillation  will 
spiral  to  the  stable  limit  cycle.  This  spiraling  to  the  limit  cycle  corresponds  to  a 
variation  in  the  amplitude  envelope.  The  nonlinear  mechanism  that  determines  the 
steady-state  limit  cycle  operation  also  controls  this  transient  response  back  lo  the 
steady  state. 

2.  Specific  Case  —  Van  dor  Pol  Negative- Resistance  Oscillator 

A  specific  case  of  an  oscillator  with  a  Van  der  Pol  type  of  nonlinearity  was  studied. 


r(x*  of) 8  (|-x“,nf ' 

l  lie  analytical  ho.ution  to  a  second-order  approximation  indicated  that  the  parameter 
has  a  strong  influence  on  the  transient  response,  a  first-order  effect  on  the 
rate  of  amplitude  variation,  and  a  second-order  effect  on  the  frequency.  These  relations 
take  the  form  da/dt  =  6H(u).  and  ^  +  62K(a).  where  the  11(a)  and  K(a)  are  spe¬ 

cific  functions  if  a,  corresponding  to  the  solution  of  the  Van  der  Pol  equation. 

It  was  noted  that  the  lower  tin*  Q  or  the  circuit,  the  faster  the  response  back  to  the 
steady  state  following  some  disturbance,  but  at  the  expense  of  frequency  stability  a:ul 
noise  reduction  properties  of  the  oscillator  in  the  steady  state. 


3.  Lower  Limit  of  Transient  Response  Time 

S | K  i  t fio  consideration  of  a  varactor-controlled  oscillator  established  a  lower  limit 
fo,  th,  transient  response  time,  l  irst.  consider  the  case  of  a  circuit  with  an  infinite  CJ. 
the  harmonic  oscillator,  governed  by  the  second-order  differential  equation 
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occurs  during  a  synthesis  run  is  reported  with  the  block  identifier  showing  where  it 
happened.  If  this  occurs,  signal  levels  in  the  vicinity  of  the  offending  block  should  be 
reduced  by  modifying  the  configuration  and/or  inputs  to  a  level  just  under  that  which 
causes  overflow.  The  only  synthesizer  processing  blocks  in  which  overflow  can  occur 
are  addition-type  blocks  and  filters. 

One  can  easily  envision  achieving  real-time  synthesis  by  doing  the  final  signal  pro¬ 
cessing  in  hardware,  either  analog  (for  example,  controlling  Moog  or  Buchla  modules) 
or  digital  (which  could  be  designed  to  be  much  more  flexible  and  would  be  inherently 
more  stable).  The  cost  of  digital  hardware  continues  to  decrease  rapidly,  which  sug¬ 
gests  that  a  digital  hardware  synthesizer  will  soon  be  economically  feasible.  At  the 
present  time  wc  are  making  a  modest  effort  to  design  and  construct  such  digital  hard¬ 
ware.  With  a  real-time  synthesizer  additional  real-time  (at  "performance"  time)  con¬ 
trol  inputs  would  become  possible.  The  distinction  between  notation  inputs  and  real-time 
inputs  would  be  somewhat  analogous  to  the  distinction  between  a  composer  and  a 
performer  or  conductor. 

Richard  E.  Albright's  thesis  research^  contributed  significantly  to  the  evolution  of 
MITSYN.  Many  conversations  and  work  sessions  with  Robert  P.  Ceely,  a  composer 
who  has  been  MITSYN's  most  demanding  user,^  have  also  stimulated  the  work. 

W.  L.  Henke 
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C.  SPEECH  ANALYSIS  BY  LINEAR  PREDICTION 
1 .  Introduction 

This  report  describes  the  development  of  a  speech  analysis  system  based  on  linear 
prediction  of  the  speech  wave.  The  analysis  is  achieved  by  representing  the  speech  wave 
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in  terms  of  a  set  of  parameters  closely  related  to  the  glottal  excitation  function  and  the 
vocal-tract  transfer  function. 

The  system  has  been  implemented  by  utilizing  the  computer  facilities  of  Group  24 
at  Lincoln  Laboratory,  M.I.T.  These  facilities  include  the  Univuc  1219  computer,  which 
is  a  medium-sized  general-purpose  computer;  the  Fast  Digital  Processor,  which  is  a 
fast  programmable  signal  processor  attached  to  the  Univac  1219;  and  peripheries,  such 
as  A/D  and  D/A  converters  and  various  display  facilities.  The  system  is  capable  of 
performing  real-time  spectrum  analysis  when  both  spectral  crrss-section  and  spectro- 
graphic  displays  arc  possib?e.  Effort  is  now  directed  toward  evaluation  of  its  perform¬ 
ance  in  extracting  such  acoustic  parameters  as  formants  and  fundamental  frequency  of 
voicing.  An  initial  attempt  at  formant  tracking  on  spectra  derived  from  linear  predic¬ 
tion  has  given  promising  results. 

We  shall  review  briefly  the  theory  of  linear  prediction,  describe  the  implemented 
system,  and  give  some  preliminary  results  of  speech  analysis  using  this  system. 

2.  Theory 

Detailed  treatments  of  the  theory  of  linear  prediction  and  its  variations  have  been 
1-4 

reported.  Our  analysis  is  based  on  the  speech- production  model  shown  in  Fig.  IX-6. 
The  all-pole  digital  filter  H(z)  represents  the  combined  effect  of  the  glottal  source,  the 


H4 


Fig.  IX-6.  Model  of  speech  production. 

» 

vocal  tract,  and  radiatio  losses.  In  this  idealized  model  the  filter  is  excited  either 
by  a  periodic  impulse  train  for  voiced  speech  or  random  noise  for  unvoiced  speech. 

The  speech  production  model  can  be  equivalently  characterized  by  the  difference 
equation 

P 

s(n)  =  £  a  s(n-k)  +  x(n),  n\ 

k=  1  K 

t  h 

where  s(n)  and  x(n)  are  the  n  samples  of  the  output  speech  wave  and  the  input,  respec¬ 
tively.  The  ak's  are  the  coefficients  characterizing  the  filter  H(z),  and  henceforth  will 
be  referred  to  as  the  predictive  coefficients. 
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From  Kc|.  I  it  is  clear  that  we  can  determine  the  a^B  If  we  know  the  I»|»ut  anti  2|t 
consecutive  value b  of  s(n).  The  first  |i  of  thette  vjiIiicb  Berven  iw  initial  conditions .  N\c 
shall  restrict  ourselves  here  to  voiced  speech  in  which  the  input  is  a  pcr*ndic  impulse 
train.  In  this  case  the  nk's  can  be  determined  with  knowledge  of  only  2p  consecutive 
values  of  s(n)  anti  the  position  of  the  impulses.  For  this  idealized  model,  we  can  define 
the  predicted  value  of  s(n)  as 

!» 

s(n)  =  1  a.  s(n-k).  W 

k-  I  K 

The  difference  between  s(n)  anti  s(n)  will  be  zero  exce|it  for  one  sample  at  the  beginning 
of  each  period. 

In  reality,  however.  s(n)  is  not  produced  by  this  highly  Idealized  model  and  therefore 
prediction  of  s(n)  based  on  Ftp  2  will  introduce  error.  If  we  are  to  approximate  s(n) 
bv  s(n)  as  defined  by  Ftp  2.  the  ak’s  can  only  be  determined  with  the  specification  of 
an  error  criterion. 

We  can  choose  to  determine  the  .t’s  hy  minimizing  the  mean-stpmred  difference 
between  s(n)  and  s(n),  that  is.  Ity  minimizing 


i-: 


-  ls(n)-s(n)J“. 
n=0 


(3> 


Note  that  the  sipiared  difference  is  summed  over  all  samples  except  one  at  the  beginning 


of  each  period,  and  we  have  assumed  that  the  minimization  is  to  be  carried  out  over  a 
section  or  s(n)  of  length  N.  It  is  also  im|*.rtant  to  note  that  p  more  values  of  s(n)  are 
nei  led  for  proper  boundary  conditions. 

The  minimum  mean- squared  error  criterion  is  chosen  instead  of  other  error  cri¬ 
teria  because  the  det  rmination  of  the  ak’s  now  reduces  to  the  solution  of  the  following 


set  of  linear  equations. 


I' 

\* 

k=  I 


a.  o.,  = 
h  Jk  jo 


j  =  I.  2,  3 . p. 


H) 


where 


o  =  1  s(n-j)  s(u-k)  k  =  0.  1, 2,  . . . ,  p.  (M 

Jk  n=0 


N'<iU  tiiat  the*  sum  in  lap  S  excludes  one  (mint  at  the  beginning  of  each  p<  rhul. 
hapiation  4  can  be  written  in  matrix  form  as 


MS 
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♦a  ■  4i, 


(6) 


where  ♦  la  a  p  X  p  matrix  with  typical  elemert  ♦  ;  a  and  4-  arj  p-dimensional  vectors 
w.th  the  J  component  given  by  Uj  and  *Jo.  respectively.  The  solution  of  this  matrix 
equation  is  greatly  simplified  by  the  fact  tiiat  the  matrix  is  symmetric  and  hence  recur- 
sive  procedures  arc  applicable. 

It  is  of  interest  to  compare  the  analysis  procedure  outlined  ubove  for  two  different 
cases.  If  the  fundamental  frequency  of  voicing  is  known  in  advance,  the  analysis  can 
be  carried  out  directly,  in  the  sense  that  Eq.  5  can  be  evaluated  exactly.  In  practice, 
however,  it  is  highly  desirable  to  carry  out  the  analysis  without  a  priori  knowledge  of 
pitch.  In  this  case  an  approximation  has  to  be  made  and  additional  error  is  Introduced. 
We  shall  illustrate  this  point  by  a  simple  example,  but  the  argument  can  easily  be 
generalized  to  include  more  complicated  situations. 

Let  us  assume  that  there  is  only  one  pitch  pulse  in  the  data  nnd  It  occurs  at  n  *  m. 

If  m  is  known,  then  Eq.  5  can  be  evaluated  as 


N-l 

♦llt  ■  -  s(n-J)  s(n-k). 

J  n»0 
nSm 


(7) 


Equation  5  can  not  Ik-  evaluated  explicitly,  however,  if  m  is  unknown. 

Ix?t  us  now  approximate  *  by 

JK 

A  N-l 

*Jk  "  “  #(n-k). 

J  n»0 


(8) 


Comparing  Kqs.  7  and  8.  wo  find  that  the  error  in  ♦  is  given  by 

JK 

"  ♦jk  "  •jk  *  (9) 

Hy  the  nature  of  the  s|K-ech  wave.  s(m-J)  and  s(m-k)  are  small  compared  with  samples 
at  the  beginning  of  each  period.  Therefore  the  error  <  „  is  small  com  (sired  with  for 
any  reasonable  X  Results  of  comparing  the  two  analysis  procedures  will  be  presented. 

*Ibe  theory  of  linear  prediction  has  also  been  formulated  in  a  slightly  different 
way  '  Let  c  (r.)  denote  the  output  of  the  inverse  filter  if ’(z)  when  it  is  excited 
by  s(n)  If  we  cboose  to  determine  the  n^fi  by  minimizing  the  total  energy  in  e(n). 
the  set  of  equations  obtained  can  be  sU.wn  to  Ik-  almost  identical  to  Kqs  4  and  5 
The  only  difference  between  the  iwu  formulations  is  that,  since  e(n)  is  of  length 
Nf  I’1  u,r  matrix  ♦  in  the  second  formulation  is  of  Toeplitz  form. 

*Jk  *  *IJ-k!,  0’  (10) 


!3(. 
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Although  the  theory  developed  thus  far  is  for  voieed  speeeh,  we  have  used  the  same 
procedure  to  determine  the  predietive  coefficients  for  unvoieed  speech. 

3.  Speeeh  System 

Figure  IX-7  is  a  bloek  diagram  of  the  analysis  system.  At  present,  only  that  part 
of  the  system  enclosed  in  the  dashed  lines  has  been  implemented.  Input  data  are  first 
p.-e -emphasized  (10  dB/octave),  bandlimited  to  5  kHz,  and  sampled  at  10  kHz.  The  eom 
putation  of  the  $jk-  as  defined  by  Eq.  8,  ean  be  greatly  redueed  by  noting  that 

Vl.k+1  =  *jk+  SH~i}  S("H)  "  3<N'1'i)  S(N_1"j)-  (H) 

Therefore  only  $.  for  j  =  0,  1,  2 . p  need  be  computed  directly.  These  are  the  first 

p+1  |X)ints  of  the  short-time  autocorrelation  function  of  s(n).  The  rest  of  the  matrix 
elements  are  obtained  recursively  from  Eq.  11.  The  last  two  terms  on  the  right-hand 
side  of  Eq.  1 1  ean  vanish  to  result  in  a  Toeplitz  matrix,  depending  on  how  the  problem 

is  formulated.  After  the  elements  of  the  matrix  are  formed,  Eq.  6  is  solved  by  the 

method  of  square -rooting. 


Fig.  EX-7.  Analysis  system. 


From  the  predictive  coefficients,  the  approximated  spectral  envelope  of  s(n)  earthen 
be  computed  as  |  H(ejw)|.  Note  that  the  unit-sample  response  of  the  inverse  filter  H  (z) 

is  given  by 


1 


for  n  =  0 


h(»)  =  < 


a 


n 


for  n  =  1,2,3 . p 


^  0  otherwise 
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Therefore  |  Hfe^)!  can  be  obtained  efficiently  by  computing  the  discrete  Fourier  trans¬ 
form  of  h(n)  with  a  fast  Fourier  transform  algorithm,  and  then  inverting  the  result.  Each 
spectral  cross  section  is  multiplied  by  the  rms  value  of  the  input  data  to  provide  gain 

normalization. 

Both  the  input  data  length  N  and  the  order  of  the  filter  p  are  variables;  the  choice 
of  these  variables  has  been  discussed  elsewhere.1,  3  Unless  otherwise  specified,  all 
results  presented  are  obtained  with  n  =  256  and  p  =  12.  The  coefficients  are  recomputed 

every  6.  4  ms. 

4.  Preliminary  Results 

In  Fig.  IX- 8  spectra  of  a  synthetic  vowel  /a/  obtained  by  using  various  techniques 
are  compared:  (a)  and  (b)  by  windowing  (with  different  window  widths)  and  Founer- 
transforming  the  waveform,  (c)  by  cepstral  smoothing,6  and  (d)  by  linear  prediction. 

In  Fig.  IX-8a  the  effect  of  glottal  periodicities  can  be  seen  as  the  ripples  superimposed 
on  the  spectral  envelope.  These  ripples  are  greatly  reduced  in  Fig.  IX-8b  because  of 
spectral  smearing  of  the  wider  frequency  window.  In  Fig.  IX- 8c  the  effect  of  glottal 
periodicities  is  removed  by  a  homomorphic  technique.  This  effect  is  also  removed  in 
Fig.  IX- 8d.  But,  since  the  analysis  is  based  on  a  specific  .odel  and  thus  limits  the 
number  of  spectral  peaks,  there  are  no  extraneous  peaks  in  Fig.  IX-8d.  If  we  compare 
the  locations  of  the  spectral  peaks  with  the  actual  values  of  the  five  formants,  it  is  clear 
that,  for  this  example,  the  spectrum  derived  from  linear  prediction  provides  accurate 
formant  locations. 

Figure  IX- 9  shows  the  spectrum  of  the  same  vowel  obtained  by  linear  prediction, 
except  that  in  this  case  the  analysis  is  carried  out  pitch- synchronously.  Comparing 
Figs.  IX- 8d  and  IX-9,  except  for  the  bandwidth  of  the  second  spectral  peak,  we  find  that 
the  qualitative  difference  between  the  two  spectra  is  quite  small. 

It  should  be  noted  that  wc  have  chosen  to  use  a  lot  of  synthetic  speech  material  in 
our  study.  This  is  because  parameters  of  synthetic  speech  are  known  exactly.  There¬ 
fore  the  use  of  synthetic  speech  can  provide  us  with  a  more  objective  evaluation  of  the 

analysis  system. 

Figure  IX- 10  is  a  spectrographic  display  of  a  sentence  generated  from  a  synthesis- 
by-rule  program  developed  by  D.  H.  Klatt.  Some  observations  can  be  made  concerinig 
Fig.  IX- 10.  First  of  all,  the  smooth  and  continuous  formant  trajectories  are  clearly 
visible  for  all  non- nasal  sonorants.  Second,  the  analysis  is  able  to  separate  closely 
spaced  formants  very  well,  as  in  the  case  of  /r/.  The  analysis  also  worked  well  for 
fricatives,  nasals,  and  stops,  in  the  sense  that  spectra  obtained  during  irication, 
nasalization,  and  aspiration  contain  the  important  features  characterizing  these  pho¬ 
nemes.  For  example,  spectra  derived  from  linear  prediction  for  nasals  all  have  a 
low-frcqucncy  peak,  followed  by  a  relative  absence  of  energy  in  the  500  ~  1500  Hz 
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region.  These  are  some  of  the  important  spectral  attributes  of  the  nasals. 

Figure  IX- 1 1  is  a  spectrographic  display  of  a  sentence  spoken  by  a  male  subject  and 
has  features  similar  to  those  discussed  above. 


o  i  2  3 

TIME  ( s) 

Fig.  IX- 10.  Spectrographic  d;  splay  of  the  sentence  "This  program  synthesizes 


a  i 

TIME  (*) 

Fig.  IX- II.  Spectrographic  display  of  the  sentence  "Can  you 
be  more  specific?"  spoken  by  a  mule  subject. 

From  Figs.  IX-  10  and  IX-  II  it  is  clear  that  during  the  voiced  |x>rtion  of  speech  the 
formants  arc  sharply  defined  and  their  trajectories  arc  smooth  and  continuous.  It  is 
therefore  reasonable  to  expect  formant  tracking  by  a  simple  peak-picking  algorithm  to 
give  good  results.  Although  results  of  this  are  not  included  hi  this  report  because  a 
voiced-unvoiced  decision  has  not  yet  been  implemented,  formant  tracking  by  a  simple 
peak-picking  algorithm  worked  well  in  a  few  examples  that  were  tried. 

The  system  provides  highly  interactive  analysis  and  display  and  is  capable  of  real¬ 
time  processing.  Figure  IX- 12  is  another  example  to  illustrate  the  highly  interactive 
display  capabilities  of  this  system.  The  sentence  is  spoken  by  a  male  subject.  By 
setting  the  pointer  to  a  specific  instant  of  time  on  the  spectrographic  display,  we  can 
display  and  examine  the  next  twelve  cross  sections  on  the  other  oscilloscope. 
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(b) 

Fig.  IX-  12.  (a)  Speetrographic  display  of  the  phrase  "Digital  signal 

processing"  spoken  by  a  male  subject;  (b)  12  cross 
sections  starting  from  the  pointer  in  (a). 


5.  Summary 

We  have  partially  implemented  a  speech- analysis  system  based  on  linear  prediction 
of  the  speech  wave.  The  analysis  technique  differs  from  all  other  techniques,  in  that 
it  is  closely  tied  to  a  speech-production  model.  Our  limited  experience  with  the  system 
indicates  that  it  is  well  suited  to  spectrum  analysis  and  is  potentially  very  useful  for 
formant  tracking.  The  voieed-unvoieed  decision  and  fundamental-frequency  extraction 
parts  of  the  system  are  now  being  implemented. 
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The  fact  that  the  analysis  is  based  on  a  specific  speech  production  model  also 
imposes  limitations  cn  the  technique.  It  is  well  known  that  during  the  production  of 
nasals  and  fricatives  there  exist  zeros  as  well  as  poles  in  the  vocal-tract  transfer 
function.  It  can  be  argued  that  we  can  always  approximate  these  zeros  by  multiple  poles 
and  that  the  important  features  characterizing  these  phonemes  are  generally  contained 
in  the  overall  shape,  not  in  the  specific  pole- zero  locations,  of  the  spectrum.  There 
are  other  unsettled  issues,  such  as  whether  the  input  speech  should  be  windowed,  which 
of  the  two  formulations  should  be  chosen  for  actual  implementation,  and  so  forth.  We 
are  now  evaluating  the  system  with  synthetic- speech  material,  with  all  parameters  such 
as  formants  and  bandwidths  known  exactly.  We  believe  that  this  evaluation,  together 
with  speech  synthesis  based  on  linear  prediction,  will  help  us  resolve  some  of  these 
issues. 

We  hope  that  this  system  can  serve  as  the  acoustic  parameter  extraction  stage  of 
a  speech-recognition  system.  Although  it  is  premature  to  speculate  on  its  performance 
for  acoustic  parameter  extraction,  the  highly  interactive  analysis  and  display  facilities 
now  developed  have  proved  to  be  useful  in  studying  the  spectral  characteristics  of  pho¬ 
nemes  and  the  spectral  changes  from  coarticulations. 

Programming  consultation  with  Mrs.  Stephanie  McCandless  is  gratefully  acknowl¬ 
edged. 

V.  W.  Zue 
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B.  SOME  CONSIDERATIONS  IN  THE  USE  OF  LINEAR 
PREDICTION  FOR  SPEECH  ANALYSIS 

1 .  Introduction 

Recently,  the  analysis  of  speech  by  means  of  a  technique  referred  to  as  linear  pre¬ 
diction,  predictive  coding,  or  least-squares  inverse  filtering  has  received  considerable 
attention.  This  technique  is  directed  toward  modeling  a  sequence  as  the  output 
of  an  all -pole  digital  filter.  When  the  .sequence  to  be  modeled  is  specified  over  the 
domain  of  all  integers  n,  there  is  a  well-defined  formulation  of  the  technique.  When 
only  a  segment  of  the  sequence  is  available,  however,  which  is  always  the  case  in  prac¬ 
tice,  there  are  several  formulations  of  the  technique  that  are  closely  related  but  have 
important  differences.  One  objective  of  this  report  is  to  summarize  these  differ¬ 
ences  and  their  implications. 

When  the  sequence  of  data  to  be  modeled  is  of  finite  length  and,  over  the  interval  for 
which  it  is  specified,  corresponds  exactly  to  the  unit-sample  response  of  an  all-pole 
filter,  the  parameters  of  the  model  obtained  by  using  linear  prediction  may  be  nonunique. 
If  the  data  correspond  closely,  but  not  exactly,  to  the  unit-sample  response  of  an  all¬ 
pole  filter,  then  the  solution  will  be  unique,  but  the  unit-sample  response  of  the  resulting 
filter  may  be  considerably  different  from  th'.*  date  and  small  changes  in  the  data  will 
result  in  large  changes  in  the  parameters  of  the  model  and  its  unit -sample  response. 

A  second  objective  of  this  report  is  to  discuss  this  property  of  the  technique. 

2.  Formulation  of  the  Linear -Prediction  Problem 

We  shall  consider  the  formulation  of  the  technique  for  two  problems.  In  problem  A 
the  data  are  specified  for  all  n,  and  in  problem  B  only  a  finite  segment  of  the  data  is 
available. 

a.  Problem  A 

Consider  a  sequence  s(n)  defined  for  all  n  and  for  which  s(n)  ■  0  for  n  <  0.  We  seek 
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an  all-pole  filter  with  transfer  function  S(^  )  ■ 


such  that  its  unit -sample 


,-k 


1  "  1  ak/ 
k*l 

A  A  A 

response  s(n)  approximates  s(n).  From  the  form  of  S (f),  s(n)  for  n  >  0  is  given  by 


s(n) 


I 

k=  1 


aks(n-k). 


(1) 


In  the  linear-prediction  technique  we  define  a  predicted  value  of  s(n),  denoted  by  s^n),  as 


~  \ 

s(n)  =  '  aks(n-k) 

k=l 


(2) 


and  choose  the  parameters  ak  to  minimize  the  error  £  defined  as 


or 


n=l 


2 


s 


I 


s(n) 


I 


aks(n-k) 


2 


(3) 


We  note  that  the  sum  on  n  excludes  the  origin  because  s(o)  depends  only  on  aQ  and  can¬ 
not  affect  the  result  of  the  minimization. 

By  setting  d£/da^  to  zero  f  ■  1,2 . p,  we  arrive  at  the  following  set  of  linear 

equations: 


1 


s(n-j)  s(n-k) 


or 


I 

n  =  1 


s(n)  s(n-j) 


In  matrix  notation. 


j  -  1.2,.  ...p. 


(4) 


<t>  a  =  4<, 


where 


n=l 


s(n-i)  s(n-j). 


(5) 


(6) 
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*j  3  ♦ 


oj‘ 


It  can  be  shown  that  the  matrix  is  symmetric  and  Toeplitz;  that  is, 


and 


*1+1, j+l  =  *ij* 

Therefore  the  solution  of  Eq.  5  is  computationally  straightforward  and  efficient.6 
We  shall  examine  some  of  the  properties  of  this  solution.  q 

i.  If  s(n)  is  indeed  the  unit-sample  response  of  an  alt-pole  filter  s(n)  =  E  b  s(n-k), 

k=)  k 

then  with  q  <  p  the  procedure  leads  to  the  unique  solution  «k  =  *>k  for  k  *  1, 2, ....  q, 
and  ak  =  0  for  k  >  q. 

ii.  It  can  be  shown  that  the  solution  S(^ )  corresponds  to  a  stable  filter.7 

iii.  It  can  be  shown  that  the  error  &  is  monotonic  with  p. 

iv.  Let  us  define  the  autocorrelation  function  of  s(n)  as 


*lj  *  y 


i. -0 


(7) 


and  the  autocorrelation  function  of  s(n)  as 

(/* 

*lj  ^  «<n) 


n=0 


i)  s(n-j). 


.  .  .  A  A  A 

It  can  be  shown  that  It.  and  It.  are  related  by  It.  =  lit  /it  lit  for  i  =  1  2 

J  J  j  1  o  o'  j  j  • 

v.  Minimizing  Kq.  3  Is  equivalent  to  minimizing 


»•••»! 


S(c>) 

S(ejw) 


-  1 


dui, 


(8) 


(9) 


where 


A 


a 


o 


P 

-  E 
k*l 


..-k 


kj 
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Therefore,  the  error  criterion  can  be  interpreted  in  a  slightly  different  manner.  Let 
u(n)  denote  the  output  of  the  filter  S-1(^)  when  it  is  excited  by  s(n).  The  linear- 
prediction  technique  then  corresponds  to  determining  the  {ak)  such  that  u(n)  is  best 
approximated  by  a  unit  sample  at  n  =  0. 

From  Eq.  9,  we  see  that  the  error  is  dependent  on  the  ratio  of  S(e;’w)  vs  Sfe^).  It 
is  also^clear  that  the  minimization  depends  on  both  the  magnitude  and  phase  of  S(e^w). 
Since  SieJ“)  can  be  shown  to  have  minimum  phase  (that  is,  all  poles  and  zeros  are 
inside  the  unit  circle),  this  procedure  will  work  best  when  Sfe^)  is  also  minimum- 
phase.  Heuristic  ally,  we  can  argue  this  in  the  following  way:  Since  S(^)  is  stable,  we 
shall  concern  ourselves  only  with  the  zeros  of  S(^).  If  S(^ )  has  zeros  inside  the 
unit  circle,  each  of  these  zeros  (excluding  those  at  the  origin)  can  be  approximated 
by  multiple  poles  by  Taylor  series  expansion,  and  the  approximation  will  improve  as  we 
increase  p,  the  order  of  S(j?).  This  suggests  that  if  s(n)  is  minimum -phase,  the  error 
asymptotically  goes  to  zero  as  p  increases.  This  is  no  longer  true,  however,  if  S(  «  ) 
is  not  minimum-phase.  Consequently  it  would  be  expected  that  if  S (y)  is  not  minimum- 
phase,  the  error  will  not  asymptotically  approach  zero  as  p  increases. 

b.  Problem  B 

In  this  case  we  consider  a  finite  segment  of  data  of  length  N  which  we  wish  to  model 

as  the  output  r  r  an  all-pole  filter.  Typically,  this  problem  has  been  formulated  in  two 
ways. 

Formulation  I 

The  data  are  multiplied  by  a  window  w(n)  and  the  N  data  points  are  numbered  from 
n  =  0  to  n  =  (N-l).  The  window  is  of  duration  N  so  that  multiplication  of  the  data  by  the 
window  results  in  a  sequence  s'(n)  which  is  zero  for  n<0  and  n>(N-l),  The  sequence  s(n) 
in  problem  A  is  then  taken  as  the  sequence  s'(n).  In  this  case  most  of  the  results  of 
problem  A  remain  unchanged,  although  the  sum  over  n  is  now  finite.  The  matrix  is 
again  Toeplitz  and  the  set  of  equations  can  be  solved  efficiently. 

This  formulation  is  sometimes  referred  to  as  the  autocorrelation  method,  since  the 
matrix  *  is  an  autocorrelation  matrix  of  s(n),  as  in  problem  A.  Empirically,  it  has 
been  found  that  for  small  N  it  is  necessary  to  multiply  s(n)  by  a  smooth  window  rather 
than  simply  to  truncate  it ,  in  order  to  minimize  the  end  effects.3,  4 

Formulation  II 

No  assumption  is  made  about  the  data  outside  the  interval  on  which  they  are  given. 
Specifically,  the  first  p  values  of  the  data  are  taken  as  initial  conditions  and  it  is 
assumed  that  with  n  =  0  denoting  the  beginning  of  the  interval  on  which  the  data  are  given, 
the  input  to  the  all -pole  filter  is  zero  for  p  ^  n  <  N.  The  error  g  is  defined  as 
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N-l 


•l 


n=p 


s(n)  -  Y  aks(n-k) 
k=l 


8  g 


where  s(n)  denotes  the  data.  To  minimize  the  error,  we  set  r —  =  0  for  i  =  1,2, 

0a.  J 

,  .  J 

and  arrive  at  the  set  of  equations 


N-l 


N-l 


Y  ak  ^  s(n-k)  s(n-j)  =  ^  s(n)  s{n-j) 
k=l  n=p  n=p 

or,  in  matrix  notation,  £  a  =  ip,  where 

N-l 

4>jj  =  ^  s(n-i)  s(n-j)  i  =  0,l,...,p 
n=p 


j  —  1 1  2,  .  . .  ,  p 


and 


'J'j  -  4*^  j  -  1 1  2,  . , . ,  p. 

This  formulation  is  sometimes  referred  to  as  the  covariance  method.  The  resulting 
matrix  $  is  still  symmetric  but  no  longer  Toeplitz.  In  fact, 

N-l  N-2 

♦i+l.j+l  =  Y  s(n-i-l)  s(n-j-l)  =  y  s(n-i)  s(n-j) 

n=p  , 

n=p-l 

=  4>.j  -  s(N-l-i)  s(N-l-j)  +  s(p-l-i)  s(p-l-j).  ^ 

The  last  terms  in  Eq.  ’0  can  be  considered  an  end -effect  correction. 

Both  formulations  have  been  used  by  researchers,  hence  it  is  appropriate  to  com¬ 
pare  their  efficiency.  This  is  shown  in  Table  X-l.  Formulation  I  has  the  following 
advantages.  Theoretically,  the  stability  of  the  resulting  filter  S( y  )  is  guaranteed, 
(although  this  is  not  true  for  implementation  with  finite  word-length  computation). 
Increasing  p  from  pQ  to  pQ  +  1  involves  only  one  additional  iteration;  therefore,  it  is 
easy  to  set  an  error  threshold  to  select  the  appropriate  value  of  p.  On  the  other  hand. 
Formulation  II  has  the  advantages  that  scaling  is  relatively  simple  for  fixed-point  imple¬ 
mentation,  and  the  computation  can  be  carried  out  in-place.  It  has  also  been  pointed 
out  that  the  square -root  method  of  solving  the  resulting  set  of  linear  equations  is 

Q 

numerically  very  stable. 
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Tabic  X-l.  Computational  efficiency  of  Formulations  1  and  II. 


Formulation  1 

Formulation  11 

Matrix 

Toeplltz 

(Can  lie  Solved  by 
Levinson's  Method**) 

Symmetric 
(C  an  lie  Solved  by 

Square-Hoot  Method8) 

Storage  (data) 

N 

N 

matrix  equation 

4p  +  4 

(p2+3p)/2 

window 

2N 

0 

Computation 

multiplies  (windowing) 

N 

0 

(compute 

pN-p- 

pN  ♦  p 

(solve  matrix  equation) 

2p^  ♦  5p-6 

(p3+9p2*2p)/6 

divides  (or  inverse) 

2f> 

P 

square-roots 

0 

P 

3.  Application  to  Sequences  Closely  Approximating  the  Hesponse 
of  an  All-Pole  Filter 

The  linear -prediction  method  is  most  suitable  for  sequences  that  can  be  closely 
approximated  as  the  response  of  an  nll-|>ole  filter.  Typically,  the  linear-prediction 
technique  is  used  to  determine  the  parameters  of  the  all -pole  filter,  nnd  spectral  anal- 
vsia  or  resynthesis  is  carried  out  by  using  these  parameters  to  generate  nn  approx¬ 
imation,  s(n),  to  the  data. 

For  problem  A  we  assumed,  by  virtue  of  Kq.  1  and  the  fact  that  the  data  are  zero 
for  n  <  0,  that  the  input  was  a  unit  sample  at  n  ■  0.  Thus,  to  generate  s(n)  from  the 
parameters,  we  excite  the  all-pole  filter  with  a  unit  sample.  For  problem  H,  the  auto¬ 
correlation  method  outlined  in  Formulation  1  suggests  the  same  procedure,  since  the 
product  of  the  data  and  the  window  is  treated  as  in  problem  A.  For  Formulation  11  In 
problem  13,  we  made  the  assumption  that  forp  *;  n  <  N  the  filter  input  Is  zero.  It  Is  use¬ 
ful  to  consider  the  result  of  applying  linear  prediction  to  data  that  do  correspond  exactly 
to  the  output  of  an  all-pole  filter  so  that  the  data  s(n)  satisfy  the  relationship 

q 

s(n)  =I  bks(n-k>  (u) 

k=l 


on  a  specified  interval,  nnd  we  choose  to  estimate  s(n)  by 


Mb 
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P 

s(n)  ■  ak«{n-k)  (12) 

iTl 

on  that  interval.  For  problem  A  the  interval  is  1  <  n  <  «  and  If  q  <  p,  s(n)Bs(n),  and  the 

coefficients  a,  will  be  equal  to  b.  for  1  <  k  <  q  and  zero  for  k  >  q.  For  problem  B,  the 
K  K  A 

autocorrelation  formulation  (Formulation  1)  will  not  in  general  give  s(n)  *  s(n),  and  the 
ak  will  not  equal  the  bR  because  the  infinite  duration  sequence  corresponding  to  the  data 
multiplied  by  the  window  no  longer  satisfies  the  relationship  of  Eq.  11.  With  q  <  p  the 
covariance  method  will  always  give  s(n)  ■  s(n)  over  the  Interval.  When  p  =  q  the  ak  arc 
uniquely  determined  and  specify  a  system  whose  unit-sample  response  Is  s(n)  *  s(n).  tor 
p  >  q.  If  the  data  satisfy  Eq.  11  for  p  «  n  <  N  but  not  for  0  <S  n  <  p,  then  we  conjecture 
that  the  ak  are  uniquely  determined.  Moreover,  the  unit -sample  response  of  the  all -pole 
filter,  s(n),  will  equal  s(n)  only  If  s(n)  corresponds  to  the  unit -sample  response  of  an 
all-pole  filter.  The  fact  .hi  t  the  specified  data  s(n)  satisfy  the  relationship  of  Eq.  11 
does  not  require  It  to  he  the  unit -sample  response  of  an  all -pole  filter  but  only  that  it 
he  the  response  to  an  input  which  Is  zero  for  p  ^  n  <  N.  Now  let  us  consider  the  ease 
for  which  the  data  satisfy  •  1 1  for  0  «  n  <  N:  that  Is.  all  of  the  specified  data  including 

the  Initial  conditions  satisfy  Eq.  11.  With  q<p  the  covariance  method  will  always  give 
s(n)  ■  s(n)  over  the  interval.  When  p  *  q  the  ak  arc  uniquely  determined  and  specify 
a  system  whose  unit-sample  response  Is  s(n)  =  s(n).  When  p  >  q.  however,  the  (pXp) 
matrix  ♦  Is  of  rank  q.  which  gives  a  p-q  parameter  family  of  solutions  for  the  ak.  For 
each  solution  vector  a  in  the  family  of  solutions,  B(n)  -  s(n),  but  one  and  only  one 
of  these  solutions  specifics  n  system  whose  unit -sample  response  Is  s(n)  «  s(n).  This 
solution  Is.  of  course,  the  one  for  which  all  of  the  ak  vanish  when  k  >  q. 

Consider  the  following  simple  example.  Let  s(n)  be  exactly  the  unit -sample 
response  of  the  filter 


S(/)  a 


1  -cj 


-1  * 


Thus  s(n)  *  onu_j(n).  Suppose  s(n)  is  estimated  by  the  second-order  linear  predictor 
7(n)  ■  BjS(n-l)  +  ajS(n-2). 


We  choose  to  minimize  the  mean-square  •"*  ror  on  the  interval  ln0.  n0+l).  using  s(nQ-l) 
and  s(no-2)  ns  starling  values  (Formulation  II).  If  nQ  >  1,  then  the  equation  ♦  a  =  4»  is 


"2 

r  i 

3* 

a  a 

ai 

8 

a 

a  1 

2 

n., 

€ 

MB 

u  w 
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Clearly,  the  matrix  £  is  singular  and  the  general  solution  for  a  can  be  expressed 
as  any  particular  solution  a°  added  to  any  linear  combination  of  vectors  spanning  the 
null  space  of  £.  We  choose  a°  =  [j]  which  is  the  particular  solution  for  which  s(n)  is 

the  unit -sample  response  of  the  filter 


Ho</> 


i 


The  general  solution  is  given  by 


u  -j  2 

where  c  is  an  arbitrary  constant.  The  solution  a  thus  lies  along  the  line  a2  =  +  a 

in  the  a  a2  plane.  To  illustrate  a  different  solution  in  the  solution  space,  suppose  c  =  1. 

Then  a.  =0,  a,  =  a2,  and  s(n)  =  s(n)  is  generated  by  the  filter 
1  2 


Hl</> 


1 


2  -2 
!-c  * 


1 


(l-aj?_1)(l+flj? 


excited  not  by  a  unit  sample,  but  by  the  sequence 


x(n)  =  UQ(n)  +  aUQ(n-l). 


We  see  that  the  pole  of  )  at  Jf.  =  -a  is  canceled  by  the  zero  at  y  =  -a  of  the  input 

sequence. 

That  the  predictor  coefficients  are  not,  in  general,  unique  is  not  surprising.  The 
linear -prediction  problem  as  formulated  seeks  to  determine  a  difference  equation,  whose 
solution  approximates  a  given  sequence  on  some  interval.  If  this  difference  equation 
is  associated  with  a  linear  system,  we  see  that  there  is  nothing  "built  into"  the  for¬ 
mulation  of  the  problem  that  specifies  the  initial  conditions  of  the  system,  that  is,  exactly 
how  the  system  was  excited.  All  that  is  required  by  the  present  formulation  of  the  prob¬ 
lem  is  that  the  input  to  the  system  vanish  over  the  interval  on  which  s(n)  is  being  pre¬ 
dicted.  Hence  the  multiplicity  of  solutions  may  be  interpreted  as  resulting  from  the 
fact  that  different  systems  with  different  inputs  can  produce  identical  outputs. 

In  practice,  we  are  not  generally  interested  in  applying  linear  prediction  to  a  sequence 
that  is  exactly  the  output  of  an  all-pole  filter  of  unknown  order.  Thus  we  do  not  expect  the 
covariance  matrix  to  be  singular  (when  it  is  singular  it  can  be  dealt  with  by  choosing 
p  =  rank  *).  We  are  interested,  however,  in  applying  linear  prediction  to  sequences 
which  may  be  modeled  approximately  as  the  output  of  an  all-pole  filter.  If  the  sequence 
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s(n)  closely  approximates  the  output  of  an  all-pole  filter  (as  speech  often  does)  the 
covariance  matrix  ®  will  be  ill-conditioned,  that  is,  almost  singular.  Although  a  unique 
solution  does  exist,  it  appears  to  be  very  sensitive  to  small  perturbations  in  the  data. 

In  particular,  if  a  small  amount  of  noise  is  added  to  the  data  which,  according  to  the 
previous  discussion,  results  in  a  family  of  solutions,  the  resulting  solution  may  be  close 
to  any  one  of  the  solutions  in  the  family,  and  as  small  perturbations  are  introduced  into 
the  data,  the  resulting  solution  may  change  radically.  A  consequence  is  that  if  the  order 
of  the  predictor  is  too  large,  and  the  data  are  close  to  the  unit-sample  response  of  an 
all-pole  filter,  as  is  often  assumed  to  be  the  case  in  speech  analysis,  the  unit-sample 
response  of  the  all-pole  filter  specified  by  the  linear -prediction  parameters  and  its 
Fourier  transform,  may  not  approximate  the  data  very  closely,  although  the  output  of 
the  filter  resulting  from  another  unspecified  input  will. 

4.  Summary  and  Conclusion 

We  have  attempted  to  point  out  the  major  differences  between  the  various  formula¬ 
tions  of  the  linear -prec'.iction  problem  and  discuss  a  set  of  important  issues  related 
to  linear  prediction.  W e  have  seen  that  there  are  generally  two  different  methods 
of  formulating  this  problem;  one  requires  s(n)  to  be  zero  outside  the  domain  of  mini¬ 
mization,  and  the  other  does  not.  The  autocorrelation  method  provides  a  good  match  to 
the  spectrum,  but  this  is  by  no  means  an  indication  of  its  superiority  over  the  covari¬ 
ance  method. 

The  uniqueness  of  the  linear -prediction  solution  is  a  very  important  issue.  Our 
experience  indicates  that  with  additive  noise  injected,  the  system  does  not  always  con¬ 
verge  to  a  desirable  answer.  What  perceptual  effect  this  has  on  speech  synthesis  is 
still  unclear.  We  hope  to  answer  this  question  better  after  experimental  speech  syn¬ 
thesis. 

M.  R.  Portnoff,  V.  W.  Zue,  A.  V.  Oppenheim 
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